Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best-performing agents are competitive with state-of-the-art centralised controllers that have access to the full state.
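The counterfactual baseline described in this abstract can be illustrated with a minimal sketch. It assumes the critic has already produced, for one agent, a Q-value for each of that agent's candidate actions with the other agents' actions held fixed; the function and argument names are hypothetical and are not taken from the authors' code.

```python
def counterfactual_advantage(q_values, policy, chosen_action):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[k]  : critic estimate of Q(s, (u^-a, k)), i.e. the joint-action value
                   when this agent takes action k and all other agents' actions
                   stay fixed (assumed to come from one critic forward pass).
    policy[k]    : probability this agent's policy assigns to action k.
    chosen_action: index of the action the agent actually took.
    """
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[chosen_action] - baseline


# Hypothetical numbers for a single agent with three actions.
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q, pi, chosen_action=2)
```

Because the baseline depends only on the other agents' actions and the agent's own policy, it isolates how much the agent's own choice contributed to the joint value.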
In this paper, hyperparameter-tuning-based Bayesian optimization of digital twins is carried out to diagnose various faults in grid-connected inverters. As fault detection and diagnosis require very high precision, we channel our efforts towards online optimization of the digital twins, which in turn allows a flexible implementation with a limited amount of data. As a result, the proposed framework not only becomes a practical solution for model versioning and deployment of digital twin designs with limited data, but also allows integration of deep learning tools to improve the hyperparameter tuning capabilities. To assess classification performance, we consider different fault cases in virtual synchronous generator (VSG) controlled grid-forming converters and demonstrate the efficacy of our approach. Our research outcomes reveal the increased accuracy and fidelity achieved by our digital twin design, overcoming the shortcomings of traditional hyperparameter tuning methods.
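Timely and accurate detection of anomalies in power electronics is becoming increasingly important for maintaining complex production systems. Robust and explainable strategies help reduce system downtime and preempt or mitigate infrastructure cyberattacks. This work begins by explaining the types of uncertainty present in current datasets and in the outputs of machine learning algorithms. Three techniques for combating these uncertainties are then introduced and analysed. We further present two anomaly detection and classification approaches, namely the matrix profile algorithm and the anomaly transformer, which are applied in the context of a power electronic converter dataset. Specifically, the matrix profile algorithm is shown to be well suited as a generalisable approach for detecting real-time anomalies in streaming time-series data. A Python library implementation of the iterative matrix profile is used to create the detector. A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy. Our numerical results show that, with simple parameter tuning, the detector provides high accuracy and performance across a variety of fault scenarios.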
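As a rough illustration of the matrix profile idea mentioned in this abstract, the sketch below computes a naive batch matrix profile: for every window of length m, the z-normalised Euclidean distance to its nearest non-overlapping neighbour. High values flag windows that look like nothing else in the series. This is a brute-force sketch with hypothetical names, not the iterative library implementation or the custom filters the authors describe.

```python
import math


def znorm(window):
    """Z-normalise a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Naive O(n^2 * m) matrix profile of `series` for window length `m`."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # skip trivial matches with overlapping windows
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(windows[i], windows[j])))
            best = min(best, dist)
        profile.append(best)
    return profile


# Hypothetical data: a periodic signal with one injected spike at index 20.
series = [0.0, 1.0, 0.0, -1.0] * 10
series[20] = 5.0
profile = matrix_profile(series, m=4)
most_anomalous_start = max(range(len(profile)), key=profile.__getitem__)
```

A streaming detector would update the profile incrementally as new samples arrive instead of recomputing every pairwise distance, which is the role an iterative implementation plays.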
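This paper focuses on improving the resource allocation algorithm in terms of the packet delivery ratio (PDR), i.e., the number of packets sent by end devices (EDs) in a long-range wide-area network (LoRaWAN) that are successfully delivered. Setting the transmission parameters significantly affects the PDR. Employing reinforcement learning (RL), we propose a resource allocation algorithm that enables the EDs to configure their transmission parameters in a distributed manner. We model the resource allocation problem as a multi-armed bandit (MAB) and then address it by proposing a two-phase algorithm named Mix-MAB, which consists of the exponential weights for exploration and exploitation (EXP3) and successive elimination (SE) algorithms. We evaluate the performance of Mix-MAB through simulation results and compare it with other existing approaches. Numerical results show that the proposed solution outperforms existing schemes in terms of convergence time and PDR.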
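One of the two building blocks named in this abstract, EXP3, can be sketched as follows. This is the textbook EXP3 update with rewards in [0, 1], under the assumption that each arm stands for one transmission-parameter configuration; it is not the authors' two-phase Mix-MAB algorithm, and `reward_fn`, `gamma`, and the other names are hypothetical.

```python
import math
import random


def exp3(reward_fn, n_arms, rounds, gamma=0.1, rng=None):
    """Run standard EXP3 and return the final arm-selection probabilities.

    reward_fn(arm) must return a reward in [0, 1] (for example, 1.0 if a
    packet sent with that configuration was delivered, else 0.0).
    """
    rng = rng or random.Random(0)
    weights = [1.0] * n_arms

    def probabilities():
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        return [(1 - gamma) * w / total + gamma / n_arms for w in weights]

    for _ in range(rounds):
        probs = probabilities()
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so weights cannot overflow; probabilities are unaffected.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return probabilities()


# Hypothetical environment: only arm 2 ever yields a delivered packet.
final_probs = exp3(lambda arm: 1.0 if arm == 2 else 0.0, n_arms=3, rounds=500)
```

Each device can run such a learner locally using only its own delivery feedback, which is what makes a bandit formulation suitable for distributed configuration.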